Building A Large Concept Bank for Representing Events in Video

نویسندگان

  • Yin Cui
  • Dong Liu
  • Jiawei Chen
  • Shih-Fu Chang
چکیده

Concept-based video representation has proven to be effective in complex event detection. However, existing methods either manually design concepts or directly adopt concept libraries not specifically designed for events. In this paper, we propose to build Concept Bank, the largest concept library consisting of 4, 876 concepts specifically designed to cover 631 real-world events. To construct the Concept Bank, we first gather a comprehensive event collection from WikiHow, a collaborative writing project that aims to build the world’s largest manual for any possible How-To event. For each event, we then search Flickr and discover relevant concepts from the tags of the returned images. We train a Multiple Kernel Linear SVM for each discovered concept as a concept detector in Concept Bank. We organize the concepts into a five-layer tree structure, in which the higher-level nodes correspond to the event categories while the leaf nodes are the event-specific concepts discovered for each event. Based on such tree ontology, we develop a semantic matching method to select relevant concepts for each textual event query, and then apply the corresponding concept detectors to generate concept-based video representations. We use TRECVID Multimedia Event Detection 2013 and Columbia Consumer Video open source event definitions and videos as our test sets and show very promising results on two video event detection tasks: event modeling over concept space and zero-shot event retrieval. To the best of our knowledge, this is the largest concept library covering the largest number of real-world events.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Recognition of Visual Events using Spatio-Temporal Information of the Video Signal

Recognition of visual events as a video analysis task has become popular in machine learning community. While the traditional approaches for detection of video events have been used for a long time, the recently evolved deep learning based methods have revolutionized this area. They have enabled event recognition systems to achieve detection rates which were not reachable by traditional approac...

متن کامل

Recommendations for recognizing video events by concept vocabularies

Representing videos using vocabularies composed of concept detectors appears promising for generic event recognition. While many have recently shown the benefits of concept vocabularies for recognition, studying the characteristics of a universal concept vocabulary suited for representing events is ignored. In this paper, we study how to create an effective vocabulary for arbitrary-event recogn...

متن کامل

Measuring technological gap ratio of wheat production using StoNED approach to metafrontier

The aim of this paper is to use the concept of the metafrontier function to study the determination of efficiency differentials and Technological Gap Ratio (TGR) on wheat production in Khorasan Razavi province. In this study, we used the metafrontier function and group frontier based on the concept of Stochastic Nonparametric Envelopment of Data analysis (StoNED). The data used in this stud...

متن کامل

Reflective Teaching in the Context of a Video Club: Nurturing Professional Relationships and Building a Learner Community

The purpose of this study was to examine how four teachers used the seven processes of videotape analysis to develop an analytic approach and reflective thinking towards their teaching. The study was organized within video clubs and was used to describe the interactions among four teachers about their experiences at a language institute. Data were gathered through videotaped recordings of lesso...

متن کامل

Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study

Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1403.7591  شماره 

صفحات  -

تاریخ انتشار 2014